AIbase

# Multimodal Instruction Model

Phi 4 Mm Inst Asr Singlish
MIT
A multimodal speech recognition model optimized for Singapore English (Singlish). It is fine-tuned from Microsoft's Phi-4 multimodal instruction model and significantly improves recognition of Singlish's distinctive phonetic features.
Audio-to-Text, Transformers, Supports Multiple Languages
mjwong
61
0
Typhoon2 Qwen2vl 7b Vision Instruct
Apache-2.0
Typhoon2-Vision is a vision-language model with Thai support. It can process image and video inputs and is specifically optimized for image-based applications.
Text-to-Image, Transformers, Supports Multiple Languages
scb10x
793
11
Xgen Mm Phi3 Mini Instruct Singleimg R V1.5
Apache-2.0
xGen-MM is the latest series of foundational large multimodal models developed by Salesforce AI Research. It builds on the successful design of the BLIP series to provide more powerful multimodal processing capabilities.
Image-to-Text, Safetensors, English
Salesforce
313
15
© 2025 AIbase